Data Introduction


Motivation and Background

Working Women: A study on female participation in the labor force around the world
a byline that describes my motivation

Some of the research questions I will explore include:

  • How do female participation rates vary from country to country?

  • What variables in the data set correlate with female participation rates?

  • Do other variables, such as life expectancy and region, relate to each other?

The data used in this analysis come from The World Bank. I used the gender section to find variables related to gender differences in work.

Variable Explanations

There were over 60 numerical variables in the original data set; however, I selected the few with the most complete data to focus on key indicators.

  • Country: the country of the observation
    -Not all countries are represented, and some have more data than others

  • Year: the year of the observation
    -Female and male life expectancy and fertility rate have data for many countries going back to 1960.
    -Female and male participation rates and the female percentage of the labor force have data starting in 1990.

  • Region: the region of the country
    -There are 7 regions

  • Income Level: the income level of the country
    -There are 4 income levels
    -According to the World Bank, “the classifications are updated each year on July 1 and are based on the GNI (Gross National Income) per capita of the previous year.” More about the income classification can be found here.

  • Male Life Expectancy: life expectancy at birth, male (years)

  • Female Life Expectancy: life expectancy at birth, female (years)

  • Fertility Rate: Number of children born per woman on average (births per woman)

  • Female Labor: Female labor force as a proportion of the total labor force (percentage)
    -Shows women's share of everyone in the labor force
    -The labor force consists of people ages 15 and older who supply labor

  • Female Participation: Share of women ages 15 and older who are in the labor force (percentage)

  • Male Participation: Share of men ages 15 and older who are in the labor force (percentage)

Analysis

In both the summary statistics and correlation tabs, only data from 2020 will be used.

Summary Statistics

The summary statistics tab shows information about each of the variables in the data set.

The number of countries in each region and income group is shown at the top.

The minimum, mean, maximum, and percentage of missing values are shown for each numerical variable.

  • Female life expectancy is higher on average than male life expectancy (75.47 vs. 70.57 years)

  • The male participation rate tends to be higher than the female participation rate (69.2% vs. 49.69% on average)

  • Both the female percentage of the labor force and the female participation rate vary widely across countries (from 8.27% to 54.91% and from 6.08% to 83.05%, respectively)

Correlation Plot

The correlation plot shows relationships between the numerical variables in the data set.

  • Male and female life expectancy are the most highly correlated pair of variables in the data set, likely because men and women in the same country share similar living conditions.

  • Female life expectancy and fertility rate have the strongest negative correlation in the data set: women tend to live longer in countries where the average fertility rate is lower.

  • Female participation and female labor are also very strongly positively correlated. This makes sense mathematically: if a larger proportion of women are in the labor force, women make up a larger share of it, as long as male participation and the population's sex ratio stay the same.

  • Female participation is not strongly correlated with any of the other variables in the data set.

In the next few tabs, I will explore how female participation relates to region and income level.


Summary Statistics


Categorical Variables

 Region                          Income Group            
 East Asia & Pacific       :37   Low income         :28  
 Europe & Central Asia     :58   Lower middle income:54  
 Latin America & Caribbean :42   Upper middle income:54  
 Middle East & North Africa:21   High income        :80  
 North America             : 3   NA's               : 1  
 South Asia                : 8                           
 Sub-Saharan Africa        :48                           

Numerical Variables

 Variable                                  Min    Mean     Max   Missing Values (%)
 Male Life Expectancy (years)            51.45   70.57    82.9                 8.29
 Female Life Expectancy (years)          55.88   75.47      88                 8.29
 Fertility Rate (births per woman)        0.84    2.57    6.74                 7.83
 Female Percentage of Labor Force (%)     8.27   41.17   54.91                13.82
 Male Participation Rate (%)             44.24    69.2   95.44                13.82
 Female Participation Rate (%)            6.08   49.69   83.05                13.82

Correlation

Current Exploration


Data Table


The table below shows a sample of the data for 2020, the most recent complete year of reporting and the year this dashboard focuses on throughout.

Female Participation Map

Female Participation

---
title: "Working Women"
output: 
  flexdashboard::flex_dashboard:
    orientation: columns
    vertical_layout: fill
    source_code: embed
    theme:
      bootswatch: zephyr
---

```{r setup, include=FALSE}
library(flexdashboard)
```

```{r imports}
# Data paths below are relative to this .Rmd file's folder (knitr's default
# working directory), so no machine-specific setwd() call is needed.

library(pacman)

# p_load() installs any missing packages, then loads them; DT is used for the data table
p_load(tidyverse, ggplot2, RColorBrewer, DataExplorer, vtable, scales, DT)

gender <- read_csv("data/gender.csv", skip = 4)
# Drop the "X" prefix that read.csv() adds to year columns (a no-op with read_csv())
colnames(gender) <- sub("^X", "", colnames(gender))

gender <- gender %>% rename(country_code = "Country Code", country_name = "Country Name", ind_code = "Indicator Code", ind_name = "Indicator Name")

region_income <- read_csv("data/region_income_level.csv") 
region_income <- region_income %>% rename(country_code = "Country Code", region = "Region", income_group = "IncomeGroup") %>% 
  select(country_code, region, income_group)

region_income <- region_income %>% subset(!is.na(country_code)) %>% subset(nchar(country_code) == 3)

# Map World Bank indicator codes to short column names
# (this order also sets the column order that the summary table labels rely on)
indicator_lookup <- c(
  "SP.DYN.LE00.MA.IN" = "m_life_exp",           # life expectancy at birth, male (years)
  "SP.DYN.LE00.FE.IN" = "f_life_exp",           # life expectancy at birth, female (years)
  "SP.DYN.TFRT.IN"    = "fertility_rate",       # births per woman
  "SL.TLF.TOTL.FE.ZS" = "female_labor",         # female share of the labor force (%)
  "SL.TLF.CACT.MA.ZS" = "male_participation",   # male participation rate (%)
  "SL.TLF.CACT.FE.ZS" = "female_participation"  # female participation rate (%)
)

# Reshape from one row per country and indicator (years as columns)
# to one row per country and year (indicators as columns)
df <- gender %>%
  filter(ind_code %in% names(indicator_lookup)) %>%
  mutate(indicator = unname(indicator_lookup[ind_code])) %>%
  select(country = country_name, country_code, indicator, matches("^[0-9]{4}$")) %>%
  pivot_longer(matches("^[0-9]{4}$"), names_to = "year",
               names_transform = list(year = as.integer)) %>%
  pivot_wider(names_from = indicator, values_from = value) %>%
  select(country, country_code, year, all_of(unname(indicator_lookup)))

df <- df %>% left_join(region_income, by = "country_code") %>% 
  select(country, year, country_code, region, income_group, everything())

df <- df %>% mutate_if(is.character, as.factor)

df$income_group <- factor(df$income_group, levels = c("Low income", "Lower middle income", "Upper middle income", "High income"))
```

Data Introduction
=======================================================================

Column {.tabset data-width=600 .tabset-fade}
-----------------------------------------------------------------------

### Motivation and Background

<font size="5"> **Working Women: A study on female participation in the labor force around the world**</font>  
a byline that describes my motivation

Some of the research questions I will explore include:

- How do female participation rates vary from country to country?

- What variables in the data set correlate with female participation rates?

- Do other variables, such as life expectancy and region, relate to each other?

The data used in this analysis come from [The World Bank](https://genderdata.worldbank.org/). I used the gender section to find variables related to gender differences in work.


### Variable Explanations

There were over 60 numerical variables in the original data set; however, I selected the few with the most complete data to focus on key indicators.

- **Country**: the country of the observation  
  -Not all countries are represented, and some have more data than others  

- **Year**: the year of the observation  
  -Female and male life expectancy and fertility rate have data for many countries going back to 1960.  
  -Female and male participation rates and the female percentage of the labor force have data starting in 1990.  

- **Region**: the region of the country  
  -There are 7 regions

- **Income Level**: the income level of the country  
  -There are 4 income levels  
  -According to the World Bank, "the classifications are updated each year on July 1 and are based on the GNI (Gross National Income) per capita of the previous year." More about the income classification can be found [here](https://blogs.worldbank.org/opendata/new-world-bank-country-classifications-income-level-2022-2023#).  
   
- **Male Life Expectancy**: life expectancy at birth, male (years)

- **Female Life Expectancy**: life expectancy at birth, female (years)

- **Fertility Rate**: Number of children born per woman on average (births per woman)

- **Female Labor**: Female labor force as a proportion of the total labor force (percentage)  
  -Shows women's share of everyone in the labor force  
  -The labor force consists of people ages 15 and older who supply labor
  
- **Female Participation**: Share of women ages 15 and older who are in the labor force (percentage)

- **Male Participation**: Share of men ages 15 and older who are in the labor force (percentage)
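The income groups above are assigned by comparing each country's GNI per capita with fixed cutoffs. As a rough sketch of that rule, the code below bins made-up GNI values with base R's `cut()`. It assumes the FY2023 thresholds (US dollars, Atlas method) from the World Bank post linked above. These thresholds are revised every July 1, so check them against the current classification. The `classify_income()` helper and the example values are hypothetical and are not used elsewhere in this dashboard.

```r
# Hypothetical helper: map GNI per capita (US$, Atlas method) to an income group.
# Assumed cutoffs: FY2023 thresholds; upper bounds are inclusive (right = TRUE).
classify_income <- function(gni_per_capita) {
  cut(gni_per_capita,
      breaks = c(-Inf, 1085, 4255, 13205, Inf),
      labels = c("Low income", "Lower middle income",
                 "Upper middle income", "High income"),
      right = TRUE)
}

# Made-up values, one falling in each group
classify_income(c(800, 2000, 10000, 50000))
```

The factor levels come out in the same low-to-high order that the setup chunk assigns to `income_group`.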

### Analysis

In both the summary statistics and correlation tabs, only data from 2020 will be used. 

**Summary Statistics**

The summary statistics tab shows information about each of the variables in the data set.

The number of countries in each region and income group is shown at the top. 

The minimum, mean, maximum, and percentage of missing values are shown for each numerical variable.   

- Female life expectancy is higher on average than male life expectancy (75.47 vs. 70.57 years)  

- The male participation rate tends to be higher than the female participation rate (69.2% vs. 49.69% on average)  

- Both the female percentage of the labor force and the female participation rate vary widely across countries (from 8.27% to 54.91% and from 6.08% to 83.05%, respectively)  
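The four columns in the summary table (minimum, mean, maximum, and percentage missing) can also be computed with dplyr. This version makes explicit that the minimum, mean, and maximum ignore missing values, while the last column counts them. It is a minimal sketch on a small made-up data frame. `toy_2020` and its values are hypothetical, not World Bank data.

```r
library(dplyr)
library(tidyr)

# Hypothetical 2020 snapshot with some missing values
toy_2020 <- tibble(
  f_life_exp           = c(80, 70, NA, 75),
  female_participation = c(50, NA, NA, 60)
)

summary_tbl <- toy_2020 %>%
  summarise(across(everything(), list(
    min         = ~ min(.x, na.rm = TRUE),
    mean        = ~ mean(.x, na.rm = TRUE),
    max         = ~ max(.x, na.rm = TRUE),
    missing_pct = ~ mean(is.na(.x)) * 100   # percentage of rows that are NA
  ))) %>%
  # One row per variable, one column per statistic
  pivot_longer(everything(),
               names_to = c("variable", ".value"),
               names_pattern = "(.*)_(min|mean|max|missing_pct)$")

summary_tbl
```

Here `mean(is.na(.x)) * 100` plays the same role as `propNA(x)*100` in the `st()` call.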

**Correlation Plot**

The correlation plot shows relationships between the numerical variables in the data set.  

- Male and female life expectancy are the most highly correlated pair of variables in the data set, likely because men and women in the same country share similar living conditions.

- Female life expectancy and fertility rate have the strongest negative correlation in the data set: women tend to live longer in countries where the average fertility rate is lower.

- Female participation and female labor are also very strongly positively correlated. This makes sense mathematically: if a larger proportion of women are in the labor force, women make up a larger share of it, as long as male participation and the population's sex ratio stay the same.

- Female participation is not strongly correlated with any of the other variables in the data set. 
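The link between female participation and the female share of the labor force can be made precise. Suppose a country has `n_f` women and `n_m` men aged 15 and older, with participation rates `p_f` and `p_m`. Women's share of the labor force is then `p_f * n_f / (p_f * n_f + p_m * n_m)`. This share rises with `p_f` when male participation and population counts are held fixed. It is not a one-to-one relationship across countries, though, because `p_m` and the sex ratio differ. The sketch below uses made-up numbers, and `female_share()` is a hypothetical helper, not part of the data set.

```r
# Hypothetical helper: women's share of the labor force (%) from
# participation rates (%) and population counts aged 15+.
female_share <- function(p_f, p_m, n_f, n_m) {
  women_in_lf <- p_f / 100 * n_f
  men_in_lf   <- p_m / 100 * n_m
  100 * women_in_lf / (women_in_lf + men_in_lf)
}

# Same population and male participation, rising female participation
shares <- female_share(p_f = c(20, 40, 60), p_m = 70, n_f = 1e6, n_m = 1e6)
round(shares, 1)  # 22.2 36.4 46.2

# Same female participation, different male participation: different shares
female_share(50, 60, 1e6, 1e6)
female_share(50, 80, 1e6, 1e6)
```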

In the next few tabs, I will explore how female participation relates to region and income level.  

Column {.tabset data-width=400}
-----------------------------------------------------------------------

### Summary Statistics
<br>
<span style="color: lightgrey;">Categorical Variables</span>

``` {r summary_cat} 
# Keep 2020 only, and drop aggregates such as "World" (rows with no region)
data_2020 <- df %>% filter(year == 2020, !is.na(region)) %>% select(-year)

region_income_table <- summary(data_2020 %>% select(region, income_group))
colnames(region_income_table) <- c("Region", "Income Group")
region_income_table
```

<span style="color: lightgrey;">Numerical Variables</span>

``` {r summary_num}
labs <- c('Male Life Expectancy',
          'Female Life Expectancy',
          'Fertility Rate',
          'Female Percentage of Labor Force',
          'Male Participation Rate',
          'Female Participation Rate')

st(data_2020 %>% select(-c("region", "income_group", "country", "country_code")),
         summ=c('min(x)',
                'mean(x)',
                'max(x)',
                'propNA(x)*100'),
         summ.names = c('Min',
                        'Mean',
                        'Max',
                        'Missing Values (%)'),
         title = "",
         digits = 2,
         labels = labs)
```

### Correlation
``` {r correlation}
corr <- data_2020 %>% select(-c("region", "income_group", "country", "country_code"))

plot_correlation(corr, cor_args = list("use" = "complete.obs"))
```

Current Exploration
=======================================================================

Column {.tabset}
----------------------------------------------------------------------
### Data Table

<br>
The table below shows a sample of the data for 2020, the most recent complete year of reporting and the year this dashboard focuses on throughout.
<br>

``` {r view}
DT::datatable(df %>% filter(year == 2020, !is.na(region))) %>%
    DT::formatRound(columns=c("female_labor", "male_participation", "female_participation"), digits=3)
```

### Female Participation Map

<p align="left"><iframe src="https://public.tableau.com/views/maps_16696760316400/MapbyFParticipation?:language=en-US&:display_count=n&:origin=viz_share_link&:showVizHome=no&:embed=true" width="600" height="400"></iframe></p>


Female Participation
=======================================================================